1 Terms of re-use

1.1 License

CC-BY-SA unless otherwise noted.

1.2 Citation

2 Purpose

To extract and visualise tweets and re-tweets of #birdoftheyear OR #boty (see https://twitter.com/hashtag/birdoftheyear and the Forest & Bird voting site).

This analysis borrows extensively from https://github.com/mkearney/rtweet

The analysis used rtweet to ask the Twitter search API to extract ‘all’ tweets containing the #birdoftheyear OR #boty hashtags in the ‘recent’ twitterVerse.

Because the search API only returns tweets from the ‘recent’ twitterVerse, it is possible that not quite all tweets have been extracted, although it seems likely that we have captured most recent human tweeting, which was the main intention. Future work should instead use the Twitter streaming API.
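
As a rough sketch of the extraction step, assuming the rtweet package is installed and a Twitter API token has already been configured, the search could look something like the following. The wrapper name `searchBoty` and the `n = 18000` cap are illustrative assumptions, not the settings used for this report.

```r
# Query string as described above
botyQuery <- "#birdoftheyear OR #boty"

# Hypothetical wrapper around rtweet::search_tweets().
# Calling it needs the rtweet package and a valid Twitter API token;
# the default for n is an assumed cap, not the value used in this report.
searchBoty <- function(query = botyQuery, n = 18000) {
  rtweet::search_tweets(
    q = query,
    n = n,
    type = "recent",          # the 'recent' search results
    include_rts = TRUE,       # keep re-tweets so they can be counted separately
    retryonratelimit = TRUE   # wait and resume if the rate limit is hit
  )
}
```

Calling `searchBoty()` would return a data frame of tweets; nothing is requested from the API until the function is called.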

## [1] "Found 20 files matching #birdoftheyear OR #boty in ~/Data/twitter/"

The data has:

3 Analysis

3.1 Tweets and Tweeters over time

Figure 3.1: Number of tweets and tweeters

Figure 3.1 shows the number of tweets and tweeters in the data extract by day. The quotes, tweets and re-tweets have been separated. Looks to me like there’s quite a lot of activity at weekends…
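
A minimal sketch of how daily counts like these might be derived, using base R and a made-up toy data frame. The column names `created_at`, `screen_name` and `tweetType` are assumptions for illustration and may not match the real extract.

```r
# Toy data standing in for the real extract (all values are made up)
tweets <- data.frame(
  created_at = as.POSIXct(c("2018-10-01 09:00:00", "2018-10-01 21:30:00",
                            "2018-10-02 03:15:00", "2018-10-02 04:00:00",
                            "2018-10-02 05:45:00"),
                          tz = "UTC"),
  screen_name = c("a", "b", "a", "a", "c"),
  tweetType = c("Tweet", "Re-tweet", "Tweet", "Tweet", "Quote"),
  stringsAsFactors = FALSE
)

# Calendar day of each tweet, in UTC
tweets$obsDate <- as.Date(tweets$created_at, tz = "UTC")

# Number of tweets per day and tweet type
nTweets <- aggregate(screen_name ~ obsDate + tweetType, data = tweets,
                     FUN = length)
names(nTweets)[3] <- "nTweets"

# Number of distinct tweeters per day and tweet type
nTweeters <- aggregate(screen_name ~ obsDate + tweetType, data = tweets,
                       FUN = function(x) length(unique(x)))
names(nTweeters)[3] <- "nTweeters"

# One row per day and tweet type, with both counts side by side
dailyCounts <- merge(nTweets, nTweeters)
```

In this toy data, tweeter "a" posts two plain tweets on 2018-10-02, so that day and type has two tweets but only one tweeter.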

If you are in New Zealand and you are wondering why there are no tweets today (2018-10-04), the answer is that Twitter data (and these plots) are working in UTC and (y)our today hasn’t started yet in UTC - but don’t worry, all the tweets are here. It’s just our old friend the timezone… :-)
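
To illustrate the timezone effect, here is a small base R sketch using a made-up timestamp. It assumes the system has the usual timezone database so that `Pacific/Auckland` is recognised.

```r
# A made-up tweet timestamp, stored in UTC as the API returns it
tweetTime <- as.POSIXct("2018-10-03 20:30:00", tz = "UTC")

# The same instant expressed as a calendar date in each timezone
utcDate <- format(tweetTime, "%Y-%m-%d", tz = "UTC")              # "2018-10-03"
nzDate  <- format(tweetTime, "%Y-%m-%d", tz = "Pacific/Auckland") # "2018-10-04"
```

The tweet was sent on the morning of 4 October in New Zealand, but it is plotted against 3 October because the plots use the UTC date.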

3.2 Who’s tweeting?

Next we’ll try by screen name.

Figure 3.2: N tweets per day by screen name

Figure 3.2 is a really bad visualisation of all tweeters tweeting over time. Each row of pixels is a tweeter (the names are probably illegible) and a green dot indicates a few tweets in the given day while a red dot indicates a lot of tweets.

So let’s re-do that for the top 50 tweeters so we can see their tweetStreaks (tm)…

Top tweeters:

Table 3.1: Top 15 tweeters (all days)
screen_name nTweets
birdoftheyear 262
Forest_and_Bird 102
testeeves 98
vote4kaki 78
NatForsdick 70
coolbiRdpics 66
mifflangstone 58
jackcraw57 50
freshwaterfelix 44
thebushline 42
kiwilullaby 40
64by4 36
newzealandbirds 35
sgalla32 35
hugobrown 34
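
A table like this could be produced along the following lines. This is a base R sketch on made-up screen names, not the code used for the report.

```r
# Toy vector with one entry per tweet (made-up screen names)
screenNames <- c("a", "b", "a", "c", "a", "b")

# Count tweets per screen name and sort, largest first
counts <- sort(table(screenNames), decreasing = TRUE)

topTweeters <- data.frame(
  screen_name = names(counts),
  nTweets = as.integer(counts),
  stringsAsFactors = FALSE
)

# Keep (at most) the top 15
top15 <- head(topTweeters, 15)
```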

And their tweetStreaks are shown in Figure 3.3

Figure 3.3: N tweets per day by screen name (top 50, reverse alphabetical)

Any twitterBots…?

3.3 Which birds are mentioned the most (by hashtag)

This is very quick and dirty but… Table 3.2 shows the total count of each #hashtag by (re)tweet type. With thanks to David Hood for code to help make sure that kakī == kaki.
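
The macron handling could be sketched roughly as below. This is not David Hood's code, which is not reproduced here; it is one possible base R approach, and the function name `cleanHashtag` and the toy data are assumptions.

```r
# Macronised vowels (lower and upper case) and their plain replacements,
# written as Unicode escapes so the script itself stays ASCII
macrons <- c("\u0101", "\u0113", "\u012b", "\u014d", "\u016b",
             "\u0100", "\u0112", "\u012a", "\u014c", "\u016a")
plain   <- c("a", "e", "i", "o", "u",
             "a", "e", "i", "o", "u")

# Hypothetical helper: strip macrons, drop the leading '#', lower-case
cleanHashtag <- function(ht) {
  for (i in seq_along(macrons)) {
    ht <- gsub(macrons[i], plain[i], ht, fixed = TRUE)
  }
  ht <- sub("^#", "", ht)
  tolower(ht)
}

# Toy hashtags (made up) with their tweet type
hashtags <- data.frame(
  hashtag = c("#kak\u012b", "#kaki", "#Kaki", "#boty"),
  tweetType = c("Tweet", "Tweet", "Re-tweet", "Tweet"),
  stringsAsFactors = FALSE
)
hashtags$htClean <- cleanHashtag(hashtags$hashtag)

# Count each cleaned hashtag by tweet type, dropping empty combinations
htCounts <- as.data.frame(
  table(htClean = hashtags$htClean, tweetType = hashtags$tweetType),
  responseName = "count",
  stringsAsFactors = FALSE
)
htCounts <- htCounts[htCounts$count > 0, ]
```

After cleaning, all three spellings of the kakī hashtag are counted together as `kaki`.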

Table 3.2: Top 20 hashtags in tweets containing #birdoftheyear or #boty
htClean ba_tweetType count
birdoftheyear Re-tweet 2715
birdoftheyear Tweet 1651
takayay Re-tweet 507
birdoftheyear Quote 441
teamkaki Re-tweet 170
kaki Re-tweet 169
boty Re-tweet 159
dammitgannet Re-tweet 143
boty Tweet 122
kereru Re-tweet 119
vote4kaki Re-tweet 82
dammitgannet Tweet 82
voteruru Re-tweet 75
aotearoa Re-tweet 65
teamrockhopper Tweet 64
teamkaki Tweet 62
greatkererucount Re-tweet 62
nativebird Re-tweet 62
woodpidgeon Re-tweet 61
votebittern Tweet 53

Figure 4.1 plots the daily occurrence of these hashtags after removing variants of #birdOfTheYear and #boty and selecting only those which have more than 10 mentions on any day. For clarity we have not separated tweets from re-tweets. See Section 6 for the problems with this approach.
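
The selection described above might look something like this in base R. The data are made up, and the regular expression used to drop the search hashtags is an assumption about what counts as a 'variant'.

```r
# Toy daily counts per cleaned hashtag (made-up numbers)
daily <- data.frame(
  obsDate = as.Date(c("2018-10-01", "2018-10-02", "2018-10-01",
                      "2018-10-02", "2018-10-01")),
  htClean = c("kaki", "kaki", "kereru", "kereru", "birdoftheyear"),
  count = c(4, 12, 3, 9, 50),
  stringsAsFactors = FALSE
)

# Drop the search hashtags themselves (assumed pattern for 'variants')
daily <- daily[!grepl("^(birdoftheyear|boty)", daily$htClean), ]

# Largest single-day count for each remaining hashtag
maxPerDay <- tapply(daily$count, daily$htClean, max)

# Keep hashtags with more than 10 mentions on at least one day
keep <- names(maxPerDay)[maxPerDay > 10]
plotData <- daily[daily$htClean %in% keep, ]
```

Here only `kaki` passes the threshold, and both of its days are kept for plotting, including the day with only 4 mentions.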

4 YMMV.

Figure 4.1: Most mentioned #hashtags per day

5 So who’s gonna win?

No idea.

There are a lot of problems with this approach (see Section 6) but if the hashtags have any predictive value at all then Figure 5.1 should be an indicator of the direction of travel (watch for lines of apparently dissimilar hashtags where the macron fix has failed) and Figure 5.2 shows the totals to date.

Figure 5.1 uses plotly to avoid having to render a large legend - just hover over the lines to see who is who…

Figure 5.1: Cumulative hashtag counts over time

Figure 5.2: Total hashtag counts to date
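
As a sketch of the arithmetic behind Figures 5.1 and 5.2, the cumulative and total counts could be computed in base R as follows. The numbers are made up and the column names are assumptions.

```r
# Toy daily counts per hashtag, deliberately out of date order
daily <- data.frame(
  obsDate = as.Date(c("2018-10-02", "2018-10-01", "2018-10-03",
                      "2018-10-02", "2018-10-01")),
  htClean = c("kaki", "kaki", "kaki", "kereru", "kereru"),
  count = c(12, 4, 5, 9, 3),
  stringsAsFactors = FALSE
)

# Sort by hashtag and date so the running sum follows time
daily <- daily[order(daily$htClean, daily$obsDate), ]

# Running total within each hashtag (the lines in a cumulative plot)
daily$cumCount <- ave(daily$count, daily$htClean, FUN = cumsum)

# Overall totals to date, largest first
totals <- sort(vapply(split(daily$count, daily$htClean), sum, numeric(1)),
               decreasing = TRUE)
```

With these numbers the running total for `kaki` goes 4, 16, 21, and the final totals are 21 for `kaki` and 12 for `kereru`.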

6 Problems with counting #hashtags

Loads of them. But primarily:

=> this is a really imperfect measure.

#YMMV

7 About

Analysis completed in 50.32 seconds ( 0.84 minutes) using knitr in RStudio with R version 3.5.1 (2018-07-02) running on x86_64-apple-darwin15.6.0.

A special mention must go to https://github.com/mkearney/rtweet (Kearney 2018) for the Twitter API interaction functions.

Other R packages used:

References

Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.

Kearney, Michael W. 2018. Rtweet: Collecting Twitter Data. https://cran.r-project.org/package=rtweet.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Wickham, Hadley. 2007. “Reshaping Data with the reshape Package.” Journal of Statistical Software 21 (12): 1–20. http://www.jstatsoft.org/v21/i12/.

———. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

Wickham, Hadley, Jim Hester, and Romain Francois. 2016. Readr: Read Tabular Data. https://CRAN.R-project.org/package=readr.

Xie, Yihui. 2016a. Bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman and Hall/CRC. https://github.com/rstudio/bookdown.

———. 2016b. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.

Zhu, Hao. 2018. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.